Energy Dynamics Laboratory - Visual Analytics of Epidemic Spread

VAST 2011 Challenge
Mini-Challenge 1 - Characterization of an Epidemic Spread

Authors and Affiliations:

Pranab Banerjee, Energy Dynamics Laboratory, Pranab.Banerjee@edl.usu.edu



Tool(s):

Provide a short description of the tool(s) you used. Mention where and when it was developed.
Additional credit to developers of the tools can be provided here, and links to find more information on the tool. 

If the tool used is a toolkit, rate the effort needed to customize the toolkit for this specific analysis. Consider such things as programming ability required, amount of time needed.

(250 words MAX)

The tools used for this analysis are:

These are all either open source or freely available. The analysis was done entirely on a Linux PC running Ubuntu 10.04.2.

Video:



AVI file

 

 

ANSWERS:


MC 1.1 Origin and Epidemic Spread: Identify approximately where the outbreak started on the map (ground zero location). If possible, outline the affected area. Explain how you arrived at your conclusion.

First, the file Microblogs.csv was filtered to extract messages that mention symptoms. A daily histogram of these messages is shown in Figure 1.



Figure 1: Histogram of daily messages
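The filtering and daily counting step can be sketched roughly as follows. This is a minimal illustration, not the workflow actually used for the submission. The keyword list, the timestamp format, and the function names are assumptions, and the rows are made-up samples rather than records from Microblogs.csv.

```python
from collections import Counter
from datetime import datetime

# Hypothetical keyword list; the submission does not state which terms were used.
SYMPTOM_TERMS = ("fever", "chills", "cough", "diarrhea", "vomit", "headache")


def is_symptom_message(text):
    """Return True if the message text mentions any of the symptom terms."""
    lowered = text.lower()
    return any(term in lowered for term in SYMPTOM_TERMS)


def daily_symptom_counts(rows, time_format="%m/%d/%Y %H:%M"):
    """Count symptom messages per calendar day.

    rows is an iterable of (timestamp_string, message_text) pairs.
    The timestamp format is an assumption about the input file.
    """
    counts = Counter()
    for created_at, text in rows:
        if is_symptom_message(text):
            day = datetime.strptime(created_at, time_format).date()
            counts[day] += 1
    return counts


# Made-up sample rows standing in for the real data.
sample_rows = [
    ("5/17/2011 09:15", "Great concert at the Dome tonight"),
    ("5/18/2011 08:05", "Woke up with a fever and chills"),
    ("5/18/2011 08:40", "This cough will not go away"),
    ("5/19/2011 02:10", "Terrible headache, going to the hospital"),
]

daily = daily_symptom_counts(sample_rows)
for day in sorted(daily):
    print(day.isoformat(), daily[day])
```

The per-day counts are the values that a histogram such as Figure 1 would plot.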



From this figure it is clear that the outbreak started on Day 138 (May 18, 2011). Ground zero was identified by examining the hour-by-hour evolution of messages on May 18th. A spike in messages starting at 8 AM marks the onset of the outbreak. Figure 2 shows the locations of these messages.


Figure 2: Location of messages on May 18, 2011 between 8 and 9 AM
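The hour-by-hour inspection was done visually. As a rough illustration only, the same idea can be expressed as an hourly count for one day plus a simple spike test. The threshold factor, the timestamp format, and the function names are assumptions and are not taken from the analysis. The timestamps are made-up samples.

```python
from collections import Counter
from datetime import date, datetime


def hourly_counts(timestamps, day, time_format="%m/%d/%Y %H:%M"):
    """Return a 24-element list of message counts for each hour of `day`."""
    counts = Counter()
    for stamp in timestamps:
        moment = datetime.strptime(stamp, time_format)
        if moment.date() == day:
            counts[moment.hour] += 1
    return [counts.get(hour, 0) for hour in range(24)]


def first_spike_hour(per_hour, factor=3.0):
    """Return the first hour whose count is at least `factor` times the
    previous hour's count (the previous count is treated as at least 1).

    The factor is a hypothetical threshold chosen for illustration.
    """
    for hour in range(1, 24):
        if per_hour[hour] >= factor * max(per_hour[hour - 1], 1):
            return hour
    return None


# Made-up timestamps of symptom messages.
sample_stamps = [
    "5/18/2011 07:10",
    "5/18/2011 08:02", "5/18/2011 08:15", "5/18/2011 08:31", "5/18/2011 08:47",
    "5/18/2011 09:05", "5/18/2011 09:20", "5/18/2011 09:33",
    "5/18/2011 09:41", "5/18/2011 09:58",
]

per_hour = hourly_counts(sample_stamps, date(2011, 5, 18))
print("messages per hour:", per_hour)
print("first spike hour:", first_spike_hour(per_hour))
```

With these sample timestamps the spike test flags the 8 AM hour, because the count rises from one message to four.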


Figure 2 places ground zero Downtown, near the Vastopolis Dome, Vastopolis City Hospital, and the Convention Center.

The most affected areas are Downtown and the areas of Smogtown, Westside and Painville along the banks of the river, as marked in Figure 2 (elaborated in section MC 1.2 and also in the video).




MC 1.2 Epidemic Spread: Present a hypothesis on how the infection is being transmitted. For example, is the method of transmission person-to-person, airborne, waterborne, or something else? Identify the trends that support your hypothesis. Is the outbreak contained? Is it necessary for emergency management personnel to deploy treatment resources outside the affected area? Explain your reasoning. (Max 1000 words, and max 5 screen shots)

We hypothesize that the method of transmission is person-to-person and waterborne, but not airborne. This is based on the following observations.

The messages were binned into hourly intervals for each day of the epidemic, and their spatial locations were plotted on the map. Figure 3 shows such a plot for May 18th between 8 AM and 9 AM. Compare this with Figure 4, which shows the locations of messages sent between 10 AM and 11 AM on the same day. The contrast of the Vastopolis map has been reduced for this visualization so that both the red message markers and the map features remain recognizable.


Figure 3: Spatial location of messages between 8 AM and 9 AM on May 18, 2011




Figure 4: Spatial location of messages between 10 AM and 11 AM on May 18, 2011


Comparison of Figures 3 and 4 shows that the epidemic has spread from the Downtown area in all directions, reaching Northville, Cornertown, Lakeside, Suburbia, and other districts. Visual comparison shows that the spread to the East is more pronounced than the spread to the West, and this is supported by a statistical analysis of the Eastern and Western halves of the map. Spread in all directions indicates that human-to-human transmission is one of the ways the epidemic is spreading. The stronger Eastward spread also indicates that the epidemic is not airborne, since the weather data shows that the wind was blowing Westward on May 18th.
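The details of the East/West statistical analysis are not given here. One simple way to quantify such a comparison, shown only as a sketch, is to count the messages on each side of the ground-zero meridian for two time windows. The coordinates below are made-up map positions in which x is assumed to increase Eastward, and the ground-zero x value and function names are hypothetical.

```python
def east_west_counts(points, ground_zero_x):
    """Count points strictly East and strictly West of ground zero.

    points is a list of (x, y) map positions; x is assumed to grow Eastward.
    """
    east = sum(1 for x, _y in points if x > ground_zero_x)
    west = sum(1 for x, _y in points if x < ground_zero_x)
    return east, west


def east_share(points, ground_zero_x):
    """Return the fraction of off-meridian points that lie to the East."""
    east, west = east_west_counts(points, ground_zero_x)
    total = east + west
    return east / total if total else None


GROUND_ZERO_X = 50.0  # hypothetical x coordinate of ground zero

# Made-up message positions for two hourly windows.
points_8am = [(49.0, 20.0), (50.5, 21.0), (51.0, 19.5), (49.5, 20.5)]
points_10am = [(44.0, 25.0), (53.0, 18.0), (57.0, 22.0),
               (61.0, 30.0), (47.0, 15.0), (55.0, 27.0)]

for label, pts in (("8-9 AM", points_8am), ("10-11 AM", points_10am)):
    east, west = east_west_counts(pts, GROUND_ZERO_X)
    print(label, "east:", east, "west:", west,
          "east share:", east_share(pts, GROUND_ZERO_X))
```

An East share that rises between the two windows, while the wind blows Westward, is the kind of pattern that argues against airborne transmission.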


Figures 5, 6, and 7 below show the spatial distribution of all messages sent on May 18th, 19th, and 20th respectively. These also confirm that the epidemic is not spreading through airborne means. If it were, we would expect a much higher density of cases in the Western half on May 19th (since the wind was blowing Westward on May 18th) and to the NNW of ground zero on May 20th (since the wind was blowing toward the NNW on May 19th). Figures 6 and 7 show no such pattern.

However, Figures 5, 6, and 7 do indicate increasing incidence along the banks of the river South of the ground zero latitude, in Westside, Smogtown and Painville. This supports the hypothesis that the epidemic is spreading through waterborne means, and is consistent with the fact that the river flows South. The fact that this increasing density of cases along the banks of the river is not observed North of the ground zero latitude also supports this hypothesis.


Figure 5: Spatial distribution of all messages for May 18, 2011



Figure 6: Spatial distribution of all messages for May 19, 2011



Figure 7: Spatial distribution of all messages for May 20, 2011


Statistical analysis using RapidMiner showed that the total numbers of symptom-related messages sent on May 18th, 19th and 20th were 10184, 16030, and 13390 respectively. Even though the number of messages on May 20th dropped by 16.47% relative to May 19th, this drop is not large enough to conclude that the outbreak is contained, especially since it seems to have spread over a wide area. Also, the hourly histogram of messages (refer to the video) shows that the number of messages on May 20th was consistently high (around 600/hr) throughout the day, even in the early hours such as 2 AM and 3 AM. The number of messages in those early hours was much lower on May 18th and May 19th. This trend also supports the hypothesis that the outbreak is not contained.
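The day-over-day changes can be checked with a short calculation. The daily totals are the ones reported above. The helper name is only illustrative.

```python
# Daily totals of symptom-related messages, as reported in the text.
daily_totals = {"May 18": 10184, "May 19": 16030, "May 20": 13390}


def percent_change(previous, current):
    """Return the relative change from `previous` to `current`, in percent."""
    return (current - previous) / previous * 100.0


rise = percent_change(daily_totals["May 18"], daily_totals["May 19"])
drop = percent_change(daily_totals["May 19"], daily_totals["May 20"])
net = percent_change(daily_totals["May 18"], daily_totals["May 20"])

print(f"May 18 -> May 19: {rise:+.2f}%")
print(f"May 19 -> May 20: {drop:+.2f}%")
print(f"May 18 -> May 20: {net:+.2f}%")
```

The second line reproduces the 16.47% drop, and the third shows that the May 20th total is still well above the May 18th total.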

We therefore conclude that emergency management personnel need to deploy treatment resources outside the affected area, since the regional spread of the outbreak is not contained.